This notebook offers a walk-through to show how the market regime label works and offers Visualization of results to tune the parameters for your dataset.¶

Methods and Visulization included¶

  1. Band-pass filter
  2. Deterministic trend(linear model)
  3. Cross-validation of 2 with clustering
  4. TSNE plot of 2 and 3
  5. Length-constrained deterministic trend
  6. Stock clustering

Data download and preprocessing¶

[*********************100%***********************]  29 of 29 completed

Data descrption and Basic manipulation¶

band-pass filter¶

Min_Max normalization¶

Deterministic Trends(linear model)¶

The low and high threshold for regime classification is adjustable.A value of -0.5 and 0.5 stand for -0.5% and 0.5% change per step.

The following sets of plots are (raw data, deterministic trend with regimes labeled, raw data with regimes labeled) of each stock¶

Cross validation¶

We use multiple cluster methods to cross validate how our deterministic trend method

dwt cluster¶

Counter({1: 1158, 0: 1771, 2: 2362, 5: 155, 3: 101, 4: 8})
[Counter({0: 690, 5: 154, 2: 92, 4: 8}),
 Counter({2: 687, 0: 1071, 5: 1}),
 Counter({2: 1393, 1: 411, 0: 9}),
 Counter({1: 747, 2: 190, 3: 101, 0: 1})]
The major cluster of regime 0 is 0. 0.73 of this regime is in this cluster. 0.39 of this laebl is in this regime
The major cluster of regime 1 is 0. 0.61 of this regime is in this cluster. 0.6 of this laebl is in this regime
The major cluster of regime 2 is 2. 0.77 of this regime is in this cluster. 0.59 of this laebl is in this regime
The major cluster of regime 3 is 1. 0.72 of this regime is in this cluster. 0.65 of this laebl is in this regime

KNN¶

KNN is less convincing in this example, but you can try it on your dataset

T-SNE¶

We regard each time-serie as an vector of a single dim and stack all vetors as multiple dims

Interpolate the normalizaed adjcp/pct_return

TSNE of linear model

TSNE of DWT

TSNE of KNN

length constrained linear model¶

In use cases, we usually set constrain on the minimum length of a regime series, this is the post-processing process of the regime series to meet minimum length

0
count 2220.000000
mean 0.079565
std 0.320038
min -3.095165
25% -0.092169
50% 0.085719
75% 0.257236
max 2.339544

Stock grouping based on similarity¶

We cluster the stock into groups based on their adjclose

[['CSCO', 'INTC', 'KO', 'MRK', 'VZ', 'WBA'],
 ['GS', 'HD', 'UNH'],
 ['AMGN', 'CAT', 'CRM', 'HON', 'MCD', 'MMM', 'MSFT', 'V'],
 ['IBM'],
 ['BA'],
 ['AAPL', 'AXP', 'CVX', 'DIS', 'JNJ', 'JPM', 'NKE', 'PG', 'TRV', 'WMT']]
This is cluster 0
This is cluster 1
This is cluster 2
This is cluster 3
This is cluster 4
This is cluster 5